(54) Propagating updates efficiently in hierarchically structured data

(57) One embodiment of the present invention provides a
system that efficiently propagates changes in
hierarchically organized data to remotely cached copies
(118) of the data. The system operates by receiving
(302) an access to the data at a client. In response to
this access, the system determines (304) if the client
(106) contains a copy of the data. If so, the system
sends a request (308) to a server (104) for an update
(122) to the copy. The server (104) receives the request
and determines differences (310) between the current
version of the data (116) at the server and an older copy
of the data (118) at the client, which the server has
stored locally. These differences are used to construct
(312) an update (122) for the copy of the data, which
may include node insertion and node deletion operations
for hierarchically organized nodes in the data.
Next, the update is sent (314) to the client (106), where
it is applied to the copy of the data to produce an
updated copy of the data. Finally, the original access is
allowed (318) to proceed on the updated copy of the
data. According to one aspect of the present invention,
the act of determining differences (310) and the act of
using the differences to construct (312) the update both
take place during a single pass through the data.
According to another aspect of the present invention,
the update (122) for the copy of the data may include
node copy, node move, node collapse and node splitting
operations.

Description



SUMMARY 



BACKGROUND 

[0001] The present invention relates to distributed 
computing systems and databases. More particularly, 
the present invention relates to a method and an appa- 
ratus that facilitates detecting changes in hierarchically 
structured data and producing corresponding updates 
for remote copies of the hierarchically structured data. 
[0002] The advent of the Internet has led to the 
development of web browsers that allow a user to navi- 
gate through inter-linked pages of textual data and 
graphical images distributed across geographically dis- 
tributed web servers. Unfortunately, as the Internet 
becomes increasingly popular, the Internet often experi- 
ences so much use that accesses from web browsers to 
web servers often slow to a crawl. 
[0003] In order to alleviate this problem, a copy of a 
portion of a web document from a web server (docu- 
ment server) can be cached on a client computer sys- 
tem, or alternatively, on an intermediate proxy server, so 
that an access to the portion of the document does not 
have to travel all the way back to the document server. 
Instead, the access can be serviced from a cached copy 
of the portion of the document located on the local com- 
puter system or on the proxy server. 
[0004] However, if the data on the document server 
is frequently updated, these updates must propagate to 
the cached copies on proxy servers and client computer 
systems. Such updates are presently propagated by 
simply sending a new copy of the data to the proxy serv- 
ers and client computer systems. However, this tech- 
nique is often inefficient because most of the data in the 
new copy is typically the same as the data in the cached 
copy. In this case, it would be more efficient to simply 
send changes to the data instead of sending a complete 
copy of the data. 

[0005] This is particularly true when the changes to 
the data involve simple manipulations in hierarchically 
structured data. Hierarchically structured data typically 
includes a collection of nodes containing data in a 
number of forms including textual data, database 
records, graphical data, and audio data. These nodes 
are typically inter-linked by pointers (or some other type 
of linkage) into a hierarchical structure, which has 
nodes that are subordinate to other nodes, such as a 
tree, although other types of linkages are possible.
[0006] Manipulations of hierarchically structured 
data may take the form of operations on nodes, such as 
node insertions, node deletions or node movements. 
Although such operations can be succinctly stated and 
easily performed, there presently exists no mechanism 
to transmit such operations to update copies of the hier- 
archically structured data. Instead, existing systems first 
apply the operations to the data, and then transmit the 
data across the network to update copies of the data on 
local machines and proxy servers. 



[0007] One embodiment of the present invention
provides a system that efficiently propagates changes
in hierarchically organized data to remotely cached copies
of the data. The system operates by receiving an
access to the data at a client. In response to this
access, the system determines if the client contains a
copy of the data. If so, the system sends a request to a
server for an update to the copy. The server receives the
request and determines differences between the current
version of the data at the server and an older copy
of the data at the client, which the server has stored
locally. These differences are used to construct an
update for the copy of the data, which may include node
insertion and node deletion operations for hierarchically
organized nodes in the data. Next, the update is sent to
the client, where it is applied to the copy of the data to
produce an updated copy of the data. Finally, the original
access is allowed to proceed on the updated copy of
the data. According to one aspect of the present invention,
the act of determining differences and the act of
using the differences to construct the update both take
place during a single pass through the data. According
to another aspect of the present invention, the update
for the copy of the data may include node copy, node
move, node collapse and node splitting operations.



BRIEF DESCRIPTION OF THE FIGURES 
[0008] 

FIG. 1 illustrates a computer system including a 
web browser and a web server in accordance with 
an embodiment of the present invention. 
FIG. 2 illustrates a computer system including a 
server that automatically updates local copies of 
documents in accordance with another embodi- 
ment of the present invention. 
FIG. 3 is a flow chart illustrating how a client 
requests an update from a server in accordance 
with an embodiment of the present invention. 
FIG. 4 is a flow chart illustrating how a server
automatically updates local copies of documents in
accordance with an embodiment of the present 
invention. 

FIG. 5 is a flow chart illustrating how the system 
creates updates for a new copy of hierarchically 
structured data in accordance with an embodiment 
of the present invention. 

FIGs. 6A-6I illustrate the steps involved in creating 
updates to transform a document tree T1 into a 
document tree T2. 



DETAILED DESCRIPTION

[0009] The following description is presented to 
enable any person skilled in the art to make and use the
invention, and is provided in the context of a particular
application and its requirements. Various modifications 
to the disclosed embodiments will be readily apparent to 
those skilled in the art, and the general principles 
defined herein may be applied to other embodiments 
and applications without departing from the scope of the 
present invention. Thus, the present invention is not 
intended to be limited to the embodiments shown, but is 
to be accorded the widest scope consistent with the 
principles and features disclosed herein. 
[0010] The data structures and code described in 
this detailed description are typically stored on a com- 
puter readable storage medium, which may be any 
device or medium that can store code and/or data for 
use by a computer system. This includes, but is not lim- 
ited to, magnetic and optical storage devices such as 
disk drives, magnetic tape, CDs (compact discs) and 
DVDs (digital video discs), and computer instruction sig- 
nals embodied in a carrier wave. For example, the car- 
rier wave may carry information across a 
communications network, such as the Internet. 

Computer System

[0011] FIG. 1 illustrates a computer system includ- 
ing a web browser and a web server in accordance with 
an embodiment of the present invention. In the illus- 
trated embodiment, network 102 couples together 
server 104 and client 106. Network 102 generally refers 
to any type of wire or wireless link between computers, 
including, but not limited to, a local area network, a wide 
area network, or a combination of networks. In one 
embodiment of the present invention, network 102 
includes the Internet. Server 104 may be any node coupled
to network 102 that includes a mechanism for servicing
requests from a client for computational or data
storage resources. Client 106 may be any node coupled 
to network 102 that includes a mechanism for request- 
ing computational or data storage resources from 
server 104. 

[0012] Server 104 contains web server 112, which
stores data for at least one web site in the form of
inter-linked pages of textual and graphical information. Web
server 112 additionally includes a mechanism to create
updates for remotely cached copies of data from web
server 112.

[0013] Web server 112 stores textual and graphical
information related to various websites in document
database 116. Document database 116 may exist in a
number of locations and in a number of forms. In one
embodiment of the present invention, document database 116
resides within the same computer system as server
104. In another embodiment, document database 116
resides at a remote location, and is accessed by server
104 through network 102. Note that portions of document
database 116 may reside in volatile or non-volatile
semiconductor memory. Alternatively, portions of document
database 116 may reside within rotating storage
devices containing magnetic, optical or magneto-optical
storage media.

[0014] Client 106 includes web browser 114, which
allows a user 110 viewing display 108 to navigate
through various websites coupled to network 102. Web
browser 114 stores cached copies 118 of portions of
website documents in local storage on client 106.
[0015] During operation, the system illustrated in
FIG. 1 operates generally as follows. In communicating
with web browser 114, user 110 generates an access to
a document in web server 112. In processing the
access, web browser 114 first examines cached copies
118 to determine if the access is directed to a portion of
a web document that is already cached within client
106. If so, client 106 makes an update request 120,
which is transferred across network 102 to server 104.
In response to the request, server 104 generates an
update 122, which is transferred to web browser 114.
Update 122 is then applied to the cached copies 118 in
order to update cached copies 118. Finally, the access
is allowed to proceed on the cached copies 118.
[0016] Note that although the example illustrated in
FIG. 1 deals with web documents for use with web
browsers and web servers, in general the present invention
can be applied to any type of data. This may include
data stored in a hierarchical database. This may also
include data related to a directory service that supports
a hierarchical name space.

[0017] Also, server 104 and web server 112 may
actually be a proxy server that stores data in transit
between a web server and web browser 114. In this
case, the invention operates on communications
between the proxy server and web browser 114.
[0018] In a variation on the embodiment illustrated
in FIG. 1, client 106 is a "thin client" with limited memory
space for storing cached copies of documents 118. In
this variation, when client 106 requests a document,
only the subset of the document that client 106 is actually
viewing is sent from server 104 to client 106. This subset
is adaptively updated as client 106 navigates through
the document.

[0019] In another variation on the above embodiment,
documents from document database 116 are
tree-structured. In this variation, documents or portions
of documents that are sent from server 104 to client 106
are first validated to ensure that they specify a proper
tree structure before they are sent to client 106. This
eliminates the need for client 106 to validate the data.
(Validation is typically performed by parsing the data,
constructing a tree from the data, and validating that the
tree is properly structured.) Reducing this work on the
client side can be particularly useful for thin clients,
which may lack computing resources for performing
such validation operations.

[0020] FIG. 2 illustrates a computer system including
a server that automatically updates local copies of
documents in accordance with another embodiment of
the present invention. In the embodiment illustrated in
FIG. 2, network 202 couples together server 204 with
workstation 206, personal computer 208, network com- 
puter 210 and personal organizer 212. Network 202 
generally refers to any type of wire or wireless link 
between computers, including, but not limited to, a local 
area network, a wide area network, or a combination of 
networks. In one embodiment of the present invention, 
network 202 includes the Internet. Server 204 may be 
any node coupled to network 202 that includes a mech- 
anism for servicing requests from a client for computa- 
tional or data storage resources. Server 204 
communicates with a number of clients, including work- 
station 206, personal computer 208, network computer 
210 and personal organizer 212. In general, a client 
may include any node coupled to network 202 that con- 
tains a mechanism for requesting computational or data 
storage resources from server 204. Note that network 
computer 210 and personal organizer 212 are both "thin
clients," because they rely on servers, such as
server 204, for data storage and computational
resources. Personal organizer 212 refers to any of a 
class of portable personal organizers containing com- 
putational and memory resources. For example, per- 
sonal organizer 212 might be a PALMPILOT™ 
distributed by the 3COM Corporation of Sunnyvale, Cal- 
ifornia. (PalmPilot is a trademark of the 3COM Corpora- 
tion). 

[0021] In the illustrated embodiment, workstation
206, personal computer 208, network computer 210
and personal organizer 212 contain cached documents
207, 209, 211 and 213, respectively. Cached documents
207, 209, 211 and 213 contain locally cached
portions of documents from server 204.

[0022] Server 204 is coupled to document database
214, which includes documents to be distributed to clients
206, 208, 210 and 212. Document database 214
may exist in a number of locations and in a number of
forms. In one embodiment of the present invention, document
database 214 resides within the same computer
system as server 204. In another embodiment, document
database 214 resides at a remote location that is
accessed by server 204 across network 202. Portions of
document database 214 may reside in volatile or
non-volatile semiconductor memory. Alternatively, portions
of document database 214 may reside within rotating
storage devices containing magnetic, optical or
magneto-optical storage media.

[0023] Server 204 includes publishing code 205, 
which includes computer code that disseminates infor- 
mation across network 202 to workstation 206, personal 
computer 208, network computer 210 and personal 
organizer 212. Publishing code 205 includes a mecha- 
nism that automatically creates updates for locally 
cached copies of documents from document database 
214 stored in clients 206, 208, 210 and 212. 
[0024] During operation, the system illustrated in
FIG. 2 operates generally as follows. Publishing code
205 periodically receives new content 230, and uses
new content 230 to update documents within document
database 214. Publishing code 205 also periodically
constructs updates for remotely cached copies of documents
from document database 214, and sends these
updates to clients, such as workstation 206, personal
computer 208, network computer 210 and personal
organizer 212. Note that these updates do not simply
contain new versions of cached documents, but rather
specify changes to cached documents.


Updating Process 

[0025] FIG. 3 is a flow chart illustrating how a client
requests an update from a server in accordance with an
embodiment of the present invention. This flow chart
describes the operation of the invention with reference
to the embodiment illustrated in FIG. 1. First, the system
receives a request to access the data (step 302). In FIG. 1,
this corresponds to user 110 requesting access to a
web page or a portion of a web page through web
browser 114 on client 106. Next, the system determines
if client 106 contains a copy of the data (step 304). This
corresponds to web browser 114 looking in cached copies
118 for the requested data. If the data is not present
on client 106, the system simply sends a copy of the
requested data from server 104 to client 106 (step 306),
and this copy is stored in cached copies 118.
[0026] If a copy of the data is present on client 106,
client 106 sends an update request 120 to server 104
requesting an update to the copy (step 308). In one
embodiment of the present invention, update request
120 includes a timestamp indicating how long ago the
previous update to cached copies 118 was created.
In response to update request 120, server 104 determines
differences between the copy of the data on client
106 and the data from document database 116 (step
310). These differences are used to construct an update
122, which specifies operations to update the copy of
the data on client 106 (step 312). Note that if client 106
sends a timestamp along with the request in step 308,
the timestamp can be used to determine the differences
between the data on server 104 and the cached copy of
the data on client 106. In another embodiment of the
present invention, server 104 saves update 122, so that
server 104 can send update 122 to other clients. In yet
another embodiment, server 104 keeps track of
changes to the data from document database 116 as
the changes occur; these changes are aggregated into
update 122. This eliminates the need to actually find
differences between the data from document database
116 and the cached copy of the data on client 106.
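
The change-tracking embodiment of paragraph [0026] can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the specification: the `ChangeLog` class, the tuple encoding of operations, and the integer timestamps are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# An operation is encoded as (name, *args). This encoding is hypothetical.
Operation = Tuple[str, ...]


@dataclass
class ChangeLog:
    """Server-side record of node operations, kept in timestamp order."""
    entries: List[Tuple[int, Operation]] = field(default_factory=list)

    def record(self, timestamp: int, op: Operation) -> None:
        # Called whenever the data in the document database changes.
        self.entries.append((timestamp, op))

    def update_since(self, client_timestamp: int) -> List[Operation]:
        # Aggregate every change made after the client's previous update.
        return [op for ts, op in self.entries if ts > client_timestamp]


log = ChangeLog()
log.record(10, ("INS", "D0.Se0.P0.S0", "a", "first sentence"))
log.record(20, ("DEL", "D0.Se1"))
log.record(30, ("MOV", "D0.Se0.P0.S2", "D0.Se2.P1.S0"))

# A client whose copy was last updated at time 15 needs only the later two.
update = log.update_since(15)
print(update)
```

Because the log already holds the operations, the server in this sketch never compares the two versions of the data; it only filters by the timestamp carried in the update request.
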
[0027] Also note that the operations specified by
update 122 may include manipulations of nodes within
the data. For example, if the data is hierarchically organized
as nodes in a tree structure, the update may specify
tree node manipulation operations, such as move,
swap, copy, insert and delete operations for leaf nodes.
The update may also specify sub-tree move, copy, swap,
insert and delete operations, as well as internal node splitting
and internal node collapsing operations. Transmitting
such node manipulation operations, instead of
transmitting the data that results after the node manipulation
operations have been applied to the data, can
greatly reduce the amount of data that must be transmitted
to update a copy of the data on client 106.
[0028] The update may additionally include a Multi- 
purpose Internet Mail Extensions (MIME) content type 
specifying that the update contains updating operations 
for hierarchically organized data. This informs a client 
receiving update 122 that update 122 contains update 
information, and not regular data. The MIME content 
type may specify that update 122 contains updating 
information that has been validated by server 104 so 
that client 106 does not have to validate update 122. 
[0029] In one embodiment of the present invention, 
the steps of determining the differences (step 310) and 
constructing the update 122 (step 312) take place con- 
currently during a single pass through the data. This 
technique has performance advantages over perform- 
ing these steps separately in two passes through the 
data. 

[0030] Next, update 122 is sent from server 104 to
client 106 (step 314), and client 106 applies update 122
to the copy of the data (step 316). In one embodiment of
the present invention, the copy of the data is stored in
semiconductor memory within client 106, and hence
applying update 122 to the copy of the data involves fast
memory operations, instead of slower disk access
operations.

[0031] Finally, the original access to the data (from
step 302) is allowed to proceed, so that user 110 can
view the data on display 108. The above process is
repeated for successive accesses to the copy of the
data on client 106.
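
The request flow of FIG. 3 can be sketched end to end as shown below. This is a simplified illustration, assuming a document is a flat list of strings of fixed length and an update is a list of (index, new text) replacements; the specification's updates are node operations on a tree. The `Server` and `Client` classes and their method names are hypothetical.

```python
from typing import Dict, List, Tuple

Document = List[str]
Update = List[Tuple[int, str]]   # hypothetical flat stand-in for node operations


class Server:
    def __init__(self, docs: Dict[str, Document]) -> None:
        self.docs = docs
        self.client_copies: Dict[str, Document] = {}   # older copy stored locally

    def fetch(self, name: str) -> Document:            # step 306
        self.client_copies[name] = list(self.docs[name])
        return list(self.docs[name])

    def make_update(self, name: str) -> Update:        # steps 310 and 312
        old, new = self.client_copies[name], self.docs[name]
        # Differences are found and the update is built in the same pass.
        update = [(i, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]
        self.client_copies[name] = list(new)
        return update


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, Document] = {}

    def access(self, name: str) -> Document:           # step 302
        if name not in self.cache:                     # step 304
            self.cache[name] = self.server.fetch(name)
        else:
            update = self.server.make_update(name)     # steps 308 and 314
            for index, text in update:                 # step 316
                self.cache[name][index] = text
        return self.cache[name]                        # the original access proceeds


server = Server({"page": ["title", "price: 10"]})
client = Client(server)
first = list(client.access("page"))        # not cached yet: full copy is sent
server.docs["page"][1] = "price: 12"       # the data changes on the server
second = client.access("page")             # cached: only the change is sent
print(first, second)
```
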

[0032] Note that although the illustrated embodi- 
ment of the present invention operates in the context of 
a web browser and a web server, the present invention 
can be applied in any context where updates to data 
have to be propagated to copies of the data. For exam- 
ple, the present invention can be applied to distributed 
database systems. 

[0033] FIG. 4 is a flow chart illustrating how server
204 (from FIG. 2) automatically updates local copies of
documents in accordance with an embodiment of the
present invention. This embodiment is an implementation
of a "push" model, in which data is pushed from a
server 204 to clients 206, 208, 210 and 212 without the
clients having to ask for the data. This differs from a
"request" model, in which the clients have to explicitly
request data before it is sent, as is illustrated in FIG. 1.
[0034] The flow chart illustrated in FIG. 4 describes
the operation of the invention with reference to the
embodiment illustrated in FIG. 2. First, server 204
receives new content 230 (step 402). This new content
230 may take the form of live updates to document database
214, for example in the form of stock pricing
information. New content 230 is used to update documents
or other data objects within document database 214 on
server 204 (step 404).

[0035] Next, publishing code 205 within server 204
determines differences between the data in document
database 214 and copies of the data on clients
(subscribers) 206, 208, 210 and 212 (step 406). These
differences are used to construct updates 216, 218, 220
and 222, which specify operations to change copies of
the data on clients 206, 208, 210 and 212, respectively
(step 408).

[0036] Updates 216, 218, 220 and 222 may specify
operations that manipulate nodes within the data. For
example, if the data is hierarchically organized as nodes
in a tree structure, updates 216, 218, 220 and 222 may
specify tree node manipulation operations, such as
move, swap, copy, insert and delete operations for leaf
nodes. The updates may also specify sub-tree move, copy,
swap, insert and delete operations, as well as internal node
splitting and internal node collapsing operations.
Transmitting such node manipulation operations, instead of
transmitting the data after node manipulation operations
have been applied to it, can greatly reduce the amount
of data that must be transmitted to update copies of the
data on clients 206, 208, 210 and 212.

[0037] In one embodiment of the present invention,
the steps of determining the differences (step 406) and
of constructing updates 216, 218, 220 and 222 (step
408) take place concurrently during a single pass
through the data. This can have a significant performance
advantage over performing these steps in two
separate passes through the data.
[0038] Next, updates 216, 218, 220 and 222 are
sent from server 204 to clients 206, 208, 210 and 212,
respectively (step 410). Clients 206, 208, 210 and 212
apply updates 216, 218, 220 and 222 to their local copies
of the data 207, 209, 211 and 213, respectively (step
412). In one embodiment of the present invention, these
updates are applied to the local copies 207,
209, 211 and 213 "in memory," without requiring disk
accesses. This allows the updates to be performed very
rapidly.

[0039] The above process is periodically repeated
by the system in order to keep copies of the data on clients
206, 208, 210 and 212 at least partially consistent
with the data on server 204. This updating process may
be repeated at any time interval from, for example, several
seconds to many days.
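
The push model of FIG. 4 can be illustrated with a short sketch. It assumes, purely for illustration, that a document is a flat dictionary and that an update consists of `SET` and `DEL` operations on keys; the specification's updates operate on tree nodes instead. `Publisher`, `diff` and `apply_ops` are hypothetical names.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, str]
Op = Tuple[str, ...]


def diff(old: Doc, new: Doc) -> List[Op]:
    """Operations that turn old into new (flat stand-in for tree operations)."""
    ops: List[Op] = [("DEL", key) for key in old if key not in new]
    ops += [("SET", key, value) for key, value in new.items() if old.get(key) != value]
    return ops


def apply_ops(doc: Doc, ops: List[Op]) -> None:
    for op in ops:
        if op[0] == "DEL":
            del doc[op[1]]
        else:
            doc[op[1]] = op[2]


class Publisher:
    def __init__(self) -> None:
        self.database: Doc = {}
        self.subscribers: Dict[str, Doc] = {}   # client name -> client's local copy
        self.known: Dict[str, Doc] = {}         # server's record of each client's copy

    def subscribe(self, name: str) -> Doc:
        self.subscribers[name] = {}
        self.known[name] = {}
        return self.subscribers[name]

    def receive(self, new_content: Doc) -> None:           # steps 402 and 404
        self.database.update(new_content)

    def publish(self) -> Dict[str, List[Op]]:
        sent: Dict[str, List[Op]] = {}
        for name, local_copy in self.subscribers.items():
            ops = diff(self.known[name], self.database)    # steps 406 and 408
            apply_ops(local_copy, ops)                     # steps 410 and 412
            self.known[name] = dict(self.database)
            sent[name] = ops
        return sent


pub = Publisher()
organizer = pub.subscribe("personal_organizer")
pub.receive({"ACME": "10.00", "INIT": "5.00"})
pub.publish()
pub.receive({"ACME": "10.25"})
second = pub.publish()      # only the changed quote is pushed
print(second, organizer)
```
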

Process of Creating Updates

[0040] FIG. 5 is a flow chart illustrating how the system
creates updates at the server for a new copy of
hierarchically structured data in accordance with an
embodiment of the present invention. This embodiment
assumes that the data is hierarchically organized as a
collection of nodes in a tree structure. This tree structure
includes a root node that can have a number of children.
These children can also have children, and so on,
until leaf nodes, which have no children, are reached.
Note that the below-described process for creating
updates requires only a single pass through the data.
During this single pass, the system determines differences
between old and new trees and creates corresponding
updates to convert the old tree into the new
tree. This eliminates the need for a separate time-consuming
pass through the data to create updates from
differences.
[0041] The system starts with an old tree (old_t)
and a new tree (new_t). The system first matches leaf
nodes of old_t and new_t (step 502). In doing so, the
system may look for exact matches or partial matches of
the data stored in the leaf nodes. In the case of partial
matches, if the quality of a match is determined to be
above a preset threshold, the leaf nodes are considered
to be "matched." Next, the system generates deletion
operations to remove nodes from old_t which are not
present in new_t (step 504).
[0042] In the next phase, the system repeats a
number of steps (506, 508, 510 and 512) for ascending
levels of the tree. First, for a given level, the system
generates node insertion operations for nodes that are
present in new_t but not in old_t (step 506). Also, if the
position of a node in old_t is different from the position
of the same node in new_t, the system generates a
move operation to move the node from its position in
old_t to its new position in new_t (step 508). Additionally,
if a parent node in old_t does not have all of the
same children in new_t, the system generates a node
split operation for the parent, splitting the parent node
into a first parent and a second parent (step 510). The
first parent inherits all of the children that are present in
new_t, and the second parent inherits the remaining
children. If a parent node in old_t has all of the same
children and additional children in new_t, the system
generates a node collapse operation to bring all the
children together in new_t (step 512).
[0043] Additionally, if all of the children of a first parent
in old_t move to a second parent in new_t, the system
generates a node collapse operation to collapse the
first parent into the second parent so that all of the children
of the first parent are inherited by the second parent.
[0044] The system repeats the above-listed steps
506, 508, 510 and 512 until the root of the tree is
reached. At this point, all of the operations that have
been generated are assembled together to create an
update that transforms old_t into new_t (step 514).
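
A heavily simplified sketch of the first steps of FIG. 5 is given below. It covers only leaf matching (step 502), deletion of unmatched old leaves (step 504) and insertion of unmatched new leaves (step 506); move, split and collapse operations and the level-by-level ascent are omitted. The flat dictionaries mapping leaf paths to value identifiers, and the function names, are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Leaves = Dict[str, str]   # leaf path -> value identifier (hypothetical encoding)


def match_leaves(old_leaves: Leaves, new_leaves: Leaves) -> Dict[str, str]:
    """Pair old and new leaf paths carrying the same value identifier (step 502)."""
    by_value = defaultdict(list)
    for path, value in new_leaves.items():
        by_value[value].append(path)
    matches: Dict[str, str] = {}
    for path, value in old_leaves.items():
        if by_value[value]:
            # Each new leaf is matched at most once.
            matches[path] = by_value[value].pop(0)
    return matches


def leaf_operations(old_leaves: Leaves, new_leaves: Leaves) -> List[Tuple[str, ...]]:
    matches = match_leaves(old_leaves, new_leaves)
    matched_new = set(matches.values())
    # Step 504: delete old leaves that have no counterpart in the new tree.
    ops: List[Tuple[str, ...]] = [("DEL", p) for p in old_leaves if p not in matches]
    # Step 506: insert new leaves that have no counterpart in the old tree.
    ops += [("INS", p, v) for p, v in new_leaves.items() if p not in matched_new]
    return ops


old_t = {"D0.P0.S0": "a", "D0.P0.S1": "b", "D0.P1.S0": "c"}
new_t = {"D0.P0.S0": "a", "D0.P0.S1": "c", "D0.P1.S0": "e"}

matches = match_leaves(old_t, new_t)
ops = leaf_operations(old_t, new_t)
print(matches)
print(ops)
```

In this example the leaf with identifier 'c' is matched at a different path in the new tree, which is the situation where the full process would go on to generate a move operation (step 508).
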

Example 

[0045] Let us consider the example tree illustrated
in FIG. 6A. This tree may represent a document consisting
of sections, paragraphs and individual sentences
containing parsable character data. Assume that the
document grammar also allows documents to contain
non-character data, say numeric data, as is represented
by the leaf node identifier 'd'. All nodes in FIG. 6A
include a name (tag), a value, and an associated value
identifier. Since the leaf nodes actually contain data,
value identifiers are assigned to them before the process
starts; whereas, for an internal node, a value identifier
is assigned during the comparison process based
upon the value identifiers of the internal node's children.
Note that in some embodiments of the present
invention, the tree data structure as represented in
memory may conform to the World Wide Web Consortium
document object model (W3C DOM).
[0046] Additionally, in some embodiments of the 
present invention, the hierarchically organized data 
includes data that conforms to the Extensible Markup 
Language (XML) standard. In other embodiments of the 
present invention, the hierarchically organized data 
includes data that conforms to the HyperText Markup
Language (HTML) standard or to other markup language
standards.

Notational Semantics

[0047] We represent each leaf node by the path 
from root node to the leaf node containing the position 
of each node along the path. Hence, the notation for 
each of the leaf nodes in FIG. 6A is as follows: 

D0.Se0.P0.S0 (left-most node)
D0.Se0.P0.S1
D0.Se0.P0.S2
D0.Se0.P0.S3
D0.Se0.P1.S0
D0.Se1.N0
D0.Se2.P0.S0
D0.Se2.P1.S0
D0.Se2.P1.S1
D0.Se2.P2.S0
D0.Se2.P2.S1 (right-most node)

The above notation is used to locate and represent any 
node in the tree, whether it be a leaf node or internal 
node. 
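
The path notation above can be reproduced with a short sketch. The tree shape below is reconstructed from the eleven leaf paths listed in the text, since FIG. 6A itself is not reproduced here, and the sketch assumes that the number after each tag is the node's position among all of its siblings. The `Node` class and `leaf_paths` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)


def leaf_paths(node: Node, position: int = 0, prefix: str = "") -> List[str]:
    """Return the root-to-leaf path of every leaf, left to right."""
    label = f"{prefix}{node.tag}{position}"
    if not node.children:
        return [label]
    paths: List[str] = []
    for index, child in enumerate(node.children):
        paths.extend(leaf_paths(child, index, label + "."))
    return paths


def paragraph(sentences: int) -> Node:
    # A paragraph node 'P' holding the given number of sentence leaves 'S'.
    return Node("P", [Node("S") for _ in range(sentences)])


# Document 'D' with three sections 'Se'; the middle one holds numeric data 'N'.
tree = Node("D", [
    Node("Se", [paragraph(4), paragraph(1)]),
    Node("Se", [Node("N")]),
    Node("Se", [paragraph(1), paragraph(2), paragraph(2)]),
])

paths = leaf_paths(tree)
print("\n".join(paths))
```
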

[0048] The notational semantics for each of the tree 
transformation operations is as follows: 

* MOV(D0.Se0.P0.S2, D0.Se2.P1.S0). In FIG. 6A, 
this operation moves the leaf node with value iden- 
tifier 'a'. Note that a similar operation can be used to 
represent a movement of an internal node. In the 
case of an internal node, the entire sub-tree moves. 
Thus, the movement of an individual node or a sub- 
tree can be an inter-parent move or an intra-parent 
move. 

* SWP(D0.Se0.P0.S2, D0.Se0.P0.S1). This operation is permitted only in the case of nodes that share a common parent (i.e., intra-parent only). The operation swaps the positions of the affected nodes under the common parent. In the case of internal nodes, entire sub-trees are swapped.

* CPY(D0.Se0.P0, D0.Se2.P2). This operation replicates a node by making an identical copy of the node. In the case of internal nodes, the entire sub-tree is copied.

* INS(D0.Se0.P0.S0, 'a', {data}). This operation inserts a node in the tree at the given position and assigns to it a value identifier 'a' along with the {data}. In the case of an internal node, the {data} assigned contains a null value.

* DEL(D0.Se0.P0). This operation deletes a node and all of its children.

* SPT(D0.Se0.P0, I). This operation splits a parent node into a first node and a second node. All of the children of the parent node starting at position I are transferred to the first node. The remaining children are transferred to the second node. The first node gets the same tag type as the original parent node.

* CLP(D0.Se0.P0, D0.Se0.P1). This operation collapses the contents of a first node and a second node. The resulting node gets the same tag type as the first node. The children of the second node become the right-most children of the resulting node.

* UPD(D0.Se0.P0.S2, {delta}). This operation specifies a change {delta} to the contents of a leaf node. The {delta} itself describes how to apply (or merge) the change.
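The path notation used by these operations can be illustrated with a small sketch. The following Python is not part of the described embodiment; it assumes that each path component is a tag followed by a child index, that the first component addresses the root, and that a move resolves its destination parent before detaching the source. The names Node, parse, locate, op_mov, op_del and op_ins, and the tree in the demo, are hypothetical.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str                       # tag type, e.g. "D", "Se", "P", "S"
    ident: str = ""                # value identifier (leaf nodes only)
    children: list = field(default_factory=list)

def parse(path):
    """Turn 'D0.Se2.P1.S0' into child indices below the root: [2, 1, 0]."""
    steps = [int(re.fullmatch(r"[A-Za-z]+(\d+)", s).group(1))
             for s in path.split(".")]
    return steps[1:]               # the first component addresses the root

def locate(root, path):
    """Return (parent, index) for the position addressed by path."""
    idx = parse(path)
    parent = root
    for i in idx[:-1]:
        parent = parent.children[i]
    return parent, idx[-1]

def op_del(root, path):
    parent, i = locate(root, path)
    del parent.children[i]         # removes the node and its whole sub-tree

def op_ins(root, path, ident, tag="S"):
    parent, i = locate(root, path)
    parent.children.insert(i, Node(tag, ident))

def op_mov(root, src, dst):
    # Assumption: dst is interpreted against the tree as it stands
    # before the source node is detached.
    dst_parent, j = locate(root, dst)
    src_parent, i = locate(root, src)
    dst_parent.children.insert(j, src_parent.children.pop(i))

# Hypothetical tree, not the tree of FIG. 6A.
root = Node("D", children=[
    Node("Se", children=[Node("P", children=[
        Node("S", "y"), Node("S", "t"), Node("S", "a")])]),
    Node("Se", children=[Node("P", children=[Node("S", "d")])]),
    Node("Se", children=[Node("P"), Node("P", children=[Node("S", "g")])]),
])
op_mov(root, "D0.Se0.P0.S2", "D0.Se2.P1.S0")   # leaf 'a' moves under D0.Se2.P1
```

Because a whole Node object is popped and re-inserted, the same op_mov call moves an entire sub-tree when the source is an internal node.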

[0049] The example described below generates a set of operations to transform an old tree T1 (FIG. 6A) into a new tree T2 (FIG. 6B). Note that in this example the leaf nodes contain actual data, and the internal nodes simply contain tags which organize and describe the data. There are three phases in the process: (1) matching the leaf nodes in T1 and T2; (2) deleting nodes in T1 with no match in T2; and (3) modifying or moving the remaining nodes to create T2.

Phase 1: Matching Leaf Nodes

[0050] The first step is to generate a unique identi- 
fier for each of the leaf nodes in T2 based on the content 
of the leaf node. This can be accomplished by using a 
hash function to generate a unique identifier for each of 
the leaf nodes. If two leaf nodes have the same content, 
then the hash function generates the same identifier. If 
two leaf nodes have the same identifier, it will not cause 
problems, because the process uses the root node to 
leaf node path to identify the individual nodes. 
[0051] Next, the process assigns value identifiers to leaf nodes of T1. For a given leaf node in T1, the process uses a hash function to generate an identifier and checks whether it matches one of the leaf node identifiers in T2. If the identifier generated does not match any of the identifiers in T2, then the process attempts to find a closest matching leaf node in T2, based on some matching criteria. For example, the process may use the Longest Common Sub-sequence (LCS) algorithm ("Data Structures and Algorithms," Aho, Alfred V., Hopcroft, John E. and Ullman, Jeffrey D., Addison-Wesley, 1983, pp. 189-194) to determine a percentage match between the contents of leaf nodes in T1 and T2. The matching criterion can be flexible. For example, the matching criterion may specify a minimum of 30% commonality in order for the leaf nodes to be matched.
[0052] Allowing matches to be made on an acceptable matching criterion provides a measure of flexibility. In case a given leaf node's content has been only slightly modified in going from T1 to T2, the system simply matches the node with its modified version in T2. The process subsequently makes the leaf nodes consistent through the UPD(node, delta) operation. However, if the commonality between leaf nodes being matched does not satisfy the matching criterion, the process assigns a unique value identifier to the leaf node in T1, which indicates that the leaf node has been deleted.
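The two-stage matching described above (exact match by hash, then a weaker content-based match) might be sketched as follows. This is an illustration only: the choice of SHA-1, the normalization of the LCS length by the longer string, the greedy assignment order, and the names value_id, lcs_len, similarity and match_leaves are assumptions that do not come from the text. Only the 30% threshold is taken from the example above.

```python
import hashlib

def value_id(content):
    """Content-based identifier: equal content yields an equal identifier."""
    return hashlib.sha1(content.encode("utf-8")).hexdigest()

def lcs_len(a, b):
    """Length of the longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Assumed 'percentage match': LCS length over the longer length."""
    longest = max(len(a), len(b))
    return lcs_len(a, b) / longest if longest else 1.0

def match_leaves(old_leaves, new_leaves, threshold=0.30):
    """Map each T1 leaf index to a T2 leaf index, or to None (to be deleted)."""
    by_id = {}
    for j, text in enumerate(new_leaves):
        by_id.setdefault(value_id(text), []).append(j)
    used, result, pending = set(), {}, []
    for i, text in enumerate(old_leaves):          # stage 1: exact matches
        free = [j for j in by_id.get(value_id(text), []) if j not in used]
        if free:
            result[i] = free[0]
            used.add(free[0])
        else:
            pending.append(i)
    for i in pending:                              # stage 2: weak matches
        best, best_score = None, 0.0
        for j, text in enumerate(new_leaves):
            if j in used:
                continue
            score = similarity(old_leaves[i], text)
            if score >= threshold and score > best_score:
                best, best_score = j, score
        result[i] = best
        if best is not None:
            used.add(best)
    return result

old = ["hello world", "totally different", "abc"]
new = ["abc", "hello there world", "zzz"]
matches = match_leaves(old, new)
```

A leaf mapped to a T2 index through the second stage would later be brought up to date with a UPD operation; a leaf mapped to None corresponds to a leaf that is treated as deleted.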

[0053] In the worst case, the time complexity of finding a match between the leaf nodes will be O(K^2), where K is the number of unique leaf node identifiers in T1 and T2. In the best case, where the leaf nodes in T1 and T2 match in a straightforward manner, the complexity will be 2*K. However, the number of changes in a document from one version to another is typically fairly small, in which case only a few leaf nodes need to be matched based on the weak matching criteria.

Phase 2: Deletion phase 

[0054] After the matching phase is complete, there may be some leaf nodes in T1 that are not matched to nodes in T2. These unmatched nodes are deleted as follows.

* For unmatched leaf nodes in T1 (from left to right), create a delete operation, such as

DEL(D0.Se2.P2.S0).

* Reduce the number of delete operations by replacing them with sub-tree delete operations, if possible. If all children belonging to a parent are to be deleted, the delete operation of each of the children can be replaced by a single delete operation of the parent node. This involves scanning the deletion list, looking for common parents. If T1 has K levels, at most K-1 scans are needed to identify a common parent for deletion. Notice that while scanning the i-th level, unreduced nodes in the i+1 level can be ignored, since they cannot be further reduced.

* After the reductions are performed, the final deletion list is repositioned, because deleting a node at position '0' alters the relative positioning of adjacent nodes. Hence, if two delete operations are to be performed on nodes that have a common parent, then the second delete operation needs to be altered to reflect the change in position of the second node to be deleted.

[0055] In the instant example, leaf nodes y, t, h and i in FIG. 6A are unmatched. In accordance with the first step, the system creates the following delete operations,

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S1),
DEL(D0.Se2.P2.S0),
DEL(D0.Se2.P2.S1).

In the second step, scanning left to right (scan level 4), the system notices that all of D0.Se2.P2's children are to be deleted. By reducing the individual delete operations "DEL(D0.Se2.P2.S0)" and "DEL(D0.Se2.P2.S1)" into a single delete operation of the parent "DEL(D0.Se2.P2)", we are left with the following delete operations.

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S1),
DEL(D0.Se2.P2).

[0056] Continuing with the level 3 scan, the system notices that the only eligible delete operation for reduction is DEL(D0.Se2.P2), since the other delete operations DEL(D0.Se0.P0.S0) and DEL(D0.Se0.P0.S1) are at level 4. Since D0.Se2.P2's parent has other children which do not participate in the delete operation, the reduction ends at scan level 3.
[0057] In the third step, the system checks to see if 
applying the first delete operation will affect the relative 
node position of any other delete operation. This 
involves looking for nodes having the same parent as 
the node being deleted. If such a node exists, the sys- 
tem adjusts its node position accordingly. Note that the 
entire deletion list need not be scanned to identify sib- 
ling nodes, because the inherent ordering in the dele- 
tion list ensures that deletion operations for sibling 
nodes will be close together in the deletion list. 
[0058] Continuing with the example, the system notices that applying the delete operation DEL(D0.Se0.P0.y0) will affect the relative positioning of sibling node D0.Se0.P0.t1. So, the system adjusts the position of its sibling (see FIG. 6C). Hence, the final deletion list becomes,

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S0),
DEL(D0.Se2.P2).
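The reduction and repositioning steps of the deletion phase can be sketched as below. The sketch is illustrative: the function names and the child_count table are hypothetical, and the child counts are assumptions, since the figures are not reproduced here. The repositioning step handles only siblings under the same parent, which is the case the text describes.

```python
import re

def index_of(component):
    """Trailing child index of a path component, e.g. 'Se2' -> 2."""
    return int(re.search(r"\d+$", component).group())

def sort_key(path):
    """Left-to-right document order for paths such as 'D0.Se2.P2'."""
    return [index_of(c) for c in path.split(".")]

def reduce_deletes(paths, child_count):
    """Replace deletes of all children of a parent by one delete of the parent."""
    deletes = set(paths)
    changed = True
    while changed:
        changed = False
        groups = {}
        for p in deletes:
            if "." in p:
                groups.setdefault(p.rsplit(".", 1)[0], []).append(p)
        for parent, kids in groups.items():
            if child_count.get(parent) == len(kids):
                deletes.difference_update(kids)
                deletes.add(parent)
                changed = True
    return sorted(deletes, key=sort_key)

def reposition(ordered):
    """Shift sibling indices down for deletes already applied to the same parent."""
    removed = {}                      # parent path -> children deleted so far
    out = []
    for p in ordered:
        parent, _, last = p.rpartition(".")
        tag = last.rstrip("0123456789")
        idx = index_of(last) - removed.get(parent, 0)
        removed[parent] = removed.get(parent, 0) + 1
        out.append(f"{parent}.{tag}{idx}" if parent else f"{tag}{idx}")
    return out

# Assumed child counts for the parents involved in the example.
child_count = {"D0.Se0.P0": 3, "D0.Se2.P2": 2, "D0.Se2": 3}
initial = ["D0.Se0.P0.S0", "D0.Se0.P0.S1", "D0.Se2.P2.S0", "D0.Se2.P2.S1"]
final = reposition(reduce_deletes(initial, child_count))
```

Under these assumed child counts, the sketch reproduces the final deletion list given above: the two deletes under D0.Se2.P2 merge into one, and the second delete under D0.Se0.P0 shifts from S1 to S0.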

Phase 3: Modification Phase

[0059] The modification phase brings together the children of internal nodes, in a bottom-up fashion. This involves scanning all the nodes from the bottom-most level (furthest from the root), and scanning each level until level zero is reached. Note that the identity of each internal node is established by the collective identity of its children. For example, if a parent node's children are identified as 'a' and 'b' respectively, then the identity of the parent is 'ab.'
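The notion of a collective identity can be sketched in a few lines. The nested-list tree representation, the function names, and the sample tree are assumptions made for illustration. The table is keyed by (level, identity) so that a node and its only child, which share an identity, remain distinguishable.

```python
def identity(node):
    """A leaf is its own identifier; an internal node concatenates its children."""
    if isinstance(node, str):
        return node
    return "".join(identity(child) for child in node)

def index_by_identity(node, path, table):
    """Record the path of every internal node under the key (level, identity)."""
    if isinstance(node, str):
        return table
    table[(len(path), identity(node))] = path
    for i, child in enumerate(node):
        index_by_identity(child, path + (i,), table)
    return table

# Hypothetical tree: internal nodes are lists, leaves are single-letter identifiers.
t2 = [[["e"], ["b", "a", "z"]], [["d"]], [["g", "c"], ["f"]]]
table = index_by_identity(t2, (), {})
```

With such a table, finding the node of one tree that corresponds to a node of the other at a given level becomes a dictionary lookup on the identity string.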

[0060] Also, if a parent node is left with no children as a result of a move operation, the parent node is deleted. Furthermore, in the special case where there is a skewed tree or sub-tree of nodes having just one child, i.e., a->b->c->d, when node 'd' is deleted, node 'c' is also deleted. This action is repeated until node 'a' is deleted as well. Instead of generating an individual delete operation for each one of the nodes, the chain of delete operations is reduced to a single delete operation of the grandest common parent of all nodes being deleted.

[0061] Pseudo-code for one embodiment of the modification phase appears below.

For each level_i in T2 (leaf to the root) {

1. TO_BE_COMPLETED_LIST = list of all the node value identifiers at level_i in T2.

2. If the node in the TO_BE_COMPLETED_LIST is the root node, find the matching node 'r' in T1'. If 'r' happens to be a root node, break from the loop. Else, partition T1' into two nodes, such that the sub-tree rooted at 'r' is moved away from T1', and becomes another tree (T1''). Next, delete the source partition (T1') by deleting its grandest common parent (the root). T1'' and T2 are now identical.

} (end of for loop)

3. Pick one of the nodes 'k' from TO_BE_COMPLETED_LIST, typically the left-most node. SIBLING_LIST = siblings of 'k' including 'k'. Note that we use the term 'node' in place of a node identifier, for convenience.

4. If none of the nodes in the SIBLING_LIST have a matching node in T1', create a parent node 'p' in T1', having the same tag type as the one in T2 (i.e., same as the parent of the nodes in the sibling list in T2). Insert all of the nodes in the sibling list into the newly created parent node in T1'. Next, move the newly created node (along with its children) to be the child of any internal node, preferably one of the parent nodes at level_i-2, if such a level exists.

5. Let S be the subset of nodes in the SIBLING_LIST that have a match in T1'. Find a parent node 'p' in T1', which has the most siblings in the SIBLING_LIST.

* Move the rest of the matched nodes in S to be the children of 'p'. If any subset of nodes being moved have a common parent 'q', and if 'q' has no other children, then collapse 'q' into 'p'. Else, individual nodes are moved by separate move operations.

* The unmatched nodes in the SIBLING_LIST are inserted into 'p'.

* Order the children of 'p' through swap operations. At the end of the swaps, all the children of 'p' which do not happen to be the children of its peer, if any, are gathered in the right-most corner. If there are such children, then a node split operation is performed, so that 'p' has exactly the same children as its peer. The newly created node (sub-tree) is at the same level as 'p' and has the same parent as 'p'. Also, the tag type of 'p' is changed to be the same as its peer in T2, if it is different.

6. Assign a node identity to 'p', which is the collective identity of its latest children. Similarly, assign an identical identity to the peer node of 'p' in T2.

7. TO_BE_COMPLETED_LIST = TO_BE_COMPLETED_LIST - SIBLING_LIST.



[0067] Applying step 5, the system generates an identity for D0.Se2.P1 and its peer in T2 (see FIG. 6G). Though T2 is not shown, it is assumed that the identity has been assigned.

[0068] Applying step 6, the system determines that TO_BE_COMPLETED_LIST = {f, e, b, a, z}. Since TO_BE_COMPLETED_LIST is not empty, the system returns to step 2. SIBLING_LIST = {f}. Steps 3 and 4 do not produce any changes. Step 5 assigns an identity to D0.Se2.P2. Step 6 removes 'f' from TO_BE_COMPLETED_LIST. Repeating the same, the system eliminates 'd' and 'e' from TO_BE_COMPLETED_LIST.

[0069] At this point, TO_BE_COMPLETED_LIST = {b, a, z} and SIBLING_LIST = {b, a, z}. Step 4 selects node D0.Se0.P0 as a matching node. At this point, node 'z' in the SIBLING_LIST is unmatched in T2. Hence, the system inserts node 'z'. Next, the system applies swap operations to order the children of D0.Se0.P0. Now, TO_BE_COMPLETED_LIST is NULL (see FIG. 6H).



8. If TO_BE_COMPLETED_LIST is not equal to NULL, then return to step 2, else continue.

[0062] Note that the above node movement opera- 
tions cause changes in the relative positioning of sibling 
nodes. Hence, the node operations generated by the 
process should take into account the positional changes 
caused by node movements. 

[0063] The system now applies the modification algorithm on T1' from FIG. 6C.

Level 3 scan 

[0064] Applying steps 1 and 2, TO_BE_COMPLETED_LIST = {g, c, f, e, b, a, z} and SIBLING_LIST = {g, c}. The system locates the children 'g' and 'c' in T1', and chooses D0.Se2.P1 to be the parent. Applying step 4, the system notices that nodes D0.Se2.P1 and D0.Se0.P1 need to be collapsed. This brings together all the nodes in the SIBLING_LIST under a common parent (see FIG. 6D).

CLP(D0.Se2.P1, D0.Se0.P1)

[0065] Next, the system uses swap operations to re-order the nodes (see FIG. 6E),

SWP(D0.Se2.P1.S1, D0.Se2.P1.S0)
SWP(D0.Se2.P1.S2, D0.Se2.P1.S1)

[0066] Next, a split operation is performed to move away children which do not truly belong to the parent (see FIG. 6F)

SPT(D0.Se2.P1, 2)
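The collapse, swap and split sequence of this level 3 scan can be traced on plain child lists. The sketch below is illustrative only. The initial children of D0.Se2.P1 and D0.Se0.P1 are assumptions chosen to be consistent with the operations listed above, since FIG. 6C is not reproduced here. The sketch also assumes that a split moves the children from position I onward into a new sibling, named D0.Se2.P2 here. The helper names clp, swp and spt are hypothetical.

```python
def clp(tree, first, second):
    """Collapse: children of `second` become the right-most children of `first`."""
    tree[first].extend(tree.pop(second))

def swp(tree, parent, i, j):
    """Swap two children of the same parent."""
    kids = tree[parent]
    kids[i], kids[j] = kids[j], kids[i]

def spt(tree, parent, i, new_sibling):
    """Split: children from position i onward move to a new sibling node."""
    tree[new_sibling] = tree[parent][i:]
    del tree[parent][i:]

# Assumed starting children (parent path -> ordered child identifiers).
tree = {"D0.Se2.P1": ["f", "g"], "D0.Se0.P1": ["c"]}

clp(tree, "D0.Se2.P1", "D0.Se0.P1")      # CLP(D0.Se2.P1, D0.Se0.P1)  -> f g c
swp(tree, "D0.Se2.P1", 1, 0)             # SWP(...S1, ...S0)          -> g f c
swp(tree, "D0.Se2.P1", 2, 1)             # SWP(...S2, ...S1)          -> g c f
spt(tree, "D0.Se2.P1", 2, "D0.Se2.P2")   # SPT(D0.Se2.P1, 2)          -> g c | f
```

After the split, D0.Se2.P1 holds exactly the SIBLING_LIST {g, c}, and the leftover child 'f' sits in its own node, ready to be handled in a later iteration.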



INS(D0.Se0.P0.S0, z, {data})
SWP(D0.Se0.P0.S2, D0.Se0.P0.S0)

Level 2 scan 

[0070] Applying steps 1 and 2, the system determines TO_BE_COMPLETED_LIST = {gc, f, e, baz} and SIBLING_LIST = {gc, f}. Applying step 4, the system chooses D0.Se2 as the parent. The system next applies swap operations to order the children, and then splits the parent D0.Se2 to move away children that do not belong to D0.Se2. Applying step 5, the system generates identities for D0.Se2 and its peer in T2 (see FIG. 6I).

SWP(D0.Se2.P0, D0.Se2.P1)
SWP(D0.Se2.P1, D0.Se2.P2)
SPT(D0.Se2, 2)

[0071] Now, TO_BE_COMPLETED_LIST = {e, baz} and SIBLING_LIST = {e, baz}. Applying step 4, the system chooses D0.Se0 as the parent. Since P(e) is the only child, the system collapses D0.Se0 and D0.Se3, and then re-orders the children through swap operations. Applying step 5, the system generates identities for D0.Se0 and its peer in T2 (see FIG. 6J).

CLP(D0.Se0, D0.Se3)
SWP(D0.Se0.P0, D0.Se0.P1)

Level 1 scan 

[0072] Applying steps 1 and 2, the system determines TO_BE_COMPLETED_LIST = {ebaz, d, gcf} and SIBLING_LIST = {ebaz, d, gcf}. Step 4 selects D0 as the parent, and applies the swap operations to re-order its children, which produces T2 (see FIG. 6K).

SWP(D0.Se0, D0.Se1)

[0073] Hence, the final set of transformations to transform T1 to T2 is:

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S0),
DEL(D0.Se2.P2),
CLP(D0.Se2.P1, D0.Se0.P1),
SWP(D0.Se2.P1.S1, D0.Se2.P1.S0),
SWP(D0.Se2.P1.S2, D0.Se2.P1.S1),
SPT(D0.Se2.P1, 2),
INS(D0.Se0.P0.S0, z, {data}),
SWP(D0.Se0.P0.S2, D0.Se0.P0.S0),
SWP(D0.Se2.P0, D0.Se2.P1),
SWP(D0.Se2.P1, D0.Se2.P2),
SPT(D0.Se2, 2),
CLP(D0.Se0, D0.Se3),
SWP(D0.Se0.P0, D0.Se0.P1), and
SWP(D0.Se0, D0.Se1).

[0074] Additionally, if partial matches of leaf nodes 
were made, the leaf nodes need to be updated using 
UPD operations. 

[0075] The above process requires that all nodes in T2 be visited and matched with corresponding nodes in T1 once. The complexity of matching the internal nodes is O(n1+n2), where n1 and n2 are the internal node counts of T1 and T2, respectively. Note that nodes can be matched by hashing node value identifiers.
[0076] Node movements and modifications also 
add to the overhead. If we consider a cost-based analy- 
sis, the cost of a transformation operation on a node T 
is a function of the number of children of T. Thus, the 
net cost of all transformations will be a function of the 
total number of nodes involved directly or indirectly in 
the transformation. 

[0077] Since there are no cycles in the transforming operations, the overhead contributed by the node movements is bounded by O(LK), where L is the number of levels in the tree, and K is the number of leaf nodes. However, typically the number of nodes involved in the movements is very small and does not involve all the nodes in a tree.

[0078] Hence, the worst case time complexity of the algorithm is a summation of the cost of matching leaf nodes O(K^2), the cost of matching internal nodes O(n1+n2), and the overhead contributed by node movements O(LK). In an average case analysis, where the number of changes to a document is less than, for example, 20%, the time complexity is a summation of the cost of matching leaf nodes O(K), the cost of matching internal nodes O(n1+n2), and the overhead contributed by node movements O(K).

Optimizations 

[0079] There exist a number of additional optimizations that can be applied to the above process.

* While trying to find a parent 'p' in T1' which has the most children in the SIBLING_LIST, if there is a tie, choose a parent with the same tag type as the one in T2.

* While re-ordering nodes within the same parent (intra-node movement) through swap operations, if the node being moved out is not in the SIBLING_LIST, it can be directly moved to be the right-most child.

* While re-ordering nodes within the same parent (intra-node movement) through swap operations, if the node being moved out is in the SIBLING_LIST, try to position the node being moved out through another swap operation.

[0080] The foregoing descriptions of embodiments of the invention have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the invention to the forms disclosed. Many modifications and variations will be apparent to practitioners skilled in the art. Accordingly, the above disclosure is not intended to limit the invention. The scope of the invention is defined by the appended claims.

Claims 

1. A method for propagating changes in data that is hierarchically organized to a copy (118) of the data, comprising:

receiving (302), at a client (106), a request to access the data;
determining (304) if the client (106) contains the copy (118) of the data;
if the client (106) contains the copy (118), sending (308) a request (120) to a server (104) for an update to the copy of the data;
after the request (120) is sent to the server (104), receiving (314) from the server (104) an update (122) for the copy of the data, wherein the update may include node insertion and node deletion operations for hierarchically organized nodes in the data;
applying (316) the update (122) to the copy of the data to produce an updated copy of the data; and
allowing (318) the requested access to the data to proceed.

2. The method of claim 1, wherein the update (122) additionally includes at least one from the group of node move, node collapse, node split and node update operations.

3. The method of claim 1 or claim 2, wherein the act of sending (308) the request to the server (104) includes sending an indicator of when the copy of the data was last updated.

4. The method of any one of claims 1 to 3, wherein the 
update includes a Multipurpose Internet Mail Exten- 
sions (MIME) content type specifying that the 
update (122) contains updating operations for hier- 
archically organized data. 

5. The method of any one of claims 1 to 4, wherein the 
act of applying (316) the update (122) to the copy 
(118) of the data takes place in semiconductor 
memory, whereby the update is able to proceed 
rapidly in the absence of time-consuming I/O oper- 
ations. 

6. A method for propagating changes in data that is hierarchically organized to a copy (118) of the data, comprising:

receiving (308), at a server (104), a request for an update for the copy of the data (118) located on a client (106);
in response to the request, determining differences (310) between the data and the copy of the data;
using the differences to construct (312) the update (122) for the copy of the data, wherein the update may include node insertion and node deletion operations for hierarchically organized nodes; and
sending (314) the update (122) from the server (104) to the client (106).

7. The method of claim 6, wherein the acts of deter- 
mining the differences (310) and constructing (312) 
the update (122) take place during a single pass 
through the data. 

8. The method of any one of claims 6 to 7, wherein: 

the act of sending (308) the request to the 
server (104) includes sending an indicator of 
when the copy of the data was last updated; 
and 

the act of determining differences (310) 
includes examining the indicator to determine 
how the data on the server has changed since 
the copy of the data was last updated. 

9. The method of any one of claims 6 to 8, wherein the 
update (122) includes a Multipurpose Internet Mail 
Extensions (MIME) content type specifying that the 
update contains updating operations for hierarchi- 
cally organized data. 

10. The method of any one of claims 6 to 9, wherein the update (122) may include one or more of the following:

a) a node copy operation that makes an identical copy of a node as well as any subtree of the node that may exist,
b) a node move operation that moves a node to another location in a tree of hierarchically organized nodes,
c) a node split operation that splits a node into two separate nodes, and divides any children of the node that may exist between the two separate nodes,
d) a node collapse operation that collapses two nodes into a single node, which inherits any children of the two nodes that may exist,
e) a node deletion operation that includes deleting any nodes that are subordinate to the node,
f) a node swap operation that swaps two nodes as well as any subtrees of the nodes that may exist.

11. The method of any one of claims 6 to 10, wherein the data that is hierarchically organized includes data that conforms to one of the following:

a) the HyperText Markup Language (HTML) standard,
b) the Extensible Markup Language (XML) standard.

12. The method of any one of claims 6 to 11, wherein the data that is hierarchically organized includes a hierarchical database.

13. The method of any one of claims 6 to 12, wherein the data that is hierarchically organized includes a directory service that supports a hierarchical name space.

14. The method of any one of claims 6 to 13, further comprising receiving (402) changes (230) to the data on the server (204) from an external source.

15. The method of any one of claims 6 to 14, wherein the copy of the data (118) located on the client (106) includes a copy of a subset of the data, which is preferably adaptively changed as required by the client (106).

16. The method of any one of claims 6 to 15, wherein the server (104) includes a proxy server for caching data in transit between a server (104) and a client (106).

17. The method of any one of claims 6 to 16, wherein the update includes data that is validated at the server.

18. A computer readable storage medium storing instructions that when executed by a computer cause the computer to perform the method of any one of claims 6 to 17.

19. An apparatus that propagates changes in data that is hierarchically organized to a copy of the data, comprising:

a receiving mechanism (112), on a server (104), that receives a request (120) for an update (122) for the copy of the data (118) located on a client (106);
an update creation mechanism (310, 312), on the server (104), that determines differences between the data and the copy of the data and uses the differences to create the update for the copy of the data, wherein the update may include node insertion and node deletion operations for hierarchically organized nodes in the data; and
a sending mechanism (314), on the server (104), that sends the update (122) from the server to the client.

20. The apparatus of claim 19, wherein the update creation mechanism (310, 312) is configured to determine differences and create the update during a single pass through the data.

21. The apparatus of claim 19 or claim 20, wherein the update additionally includes at least one from the group of node move, node collapse, node split and node update operations.

22. The apparatus of any one of claims 19 to 21, wherein the update creation mechanism (310, 312) is configured to determine when the copy of the data on the client was last updated, and to determine how the data on the server (104) has changed since the copy of the data (118) was last updated.

23. The apparatus of any one of claims 19 to 22, wherein the update includes a Multipurpose Internet Mail Extensions (MIME) content type specifying that the update contains updating operations for hierarchically organized data.

24. An apparatus that propagates changes in data that is hierarchically organized to a copy of the data (118), comprising:

a request generation mechanism (302, 304, 308), on a client, that receives an access to the data (118), determines if the client contains the copy of the data, and if so, sends a request for an update to a server (104);
an update receiving mechanism (314), on the client, that receives an update for the copy of the data from the server, wherein the update may include node insertion and node deletion operations for hierarchically organized nodes in the data; and
an updating mechanism (316, 318), on the client, that applies the update to the copy of the data on the client to produce an updated copy of the data.

25. A method for propagating changes in data, the data being organized isomorphically to a hierarchy having a plurality of nodes, the method comprising:

receiving (308) at a server (104) a request for an update (122) to a copy of the data (118);
responsively to the request, determining (310) at the server (104) a set of one or more differences between the copy and a preferred version of the data, the server (104) having access to the copy and the preferred version at least for purposes of determining the set of differences;
using the set of differences to construct (312) an update (122) for the copy, the update (122) being suitable to conform the copy to the preferred version of the data, the update including at least one operation selected from the group of inserting a node in the data hierarchy or deleting a node from the data hierarchy; and
making the update thus constructed available for further use.

26. The method of claim 25, wherein the acts of determining the differences (310) and constructing (312) the update take place during a single pass through the data.